Parameter exploration

Purpose: Run the simulation with varying parameters and characterize the effects of those parameters

Parameter exploration can be done either as a grid search, where a multidimensional "regular" grid of parameter values is explored, or as a random search, where random parameter values are picked.
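The two approaches can be sketched with plain Python (the parameter names and ranges here are just placeholders):

```python
import itertools
import random

# Hypothetical parameter ranges, for illustration only.
dims = [2, 3, 4]
neurons = [10, 50, 100]

# Grid search: every combination on the "regular" grid.
grid = [{'D': d, 'N': n} for d, n in itertools.product(dims, neurons)]
print(len(grid))  # 9 combinations

# Random search: draw a fixed number of random assignments instead.
random.seed(0)
samples = [{'D': random.choice(dims), 'N': random.choice(neurons)}
           for _ in range(5)]
```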

psyrun tutorial


In [1]:
from __future__ import print_function
from pprint import pprint

import numpy as np

This tutorial introduces the "psyrun" tool for parameter space exploration and serial farming step by step. psyrun also integrates well with "ctn_benchmarks"; we will use the CommunicationChannel benchmark as an example.


In [2]:
from ctn_benchmark.nengo.communication import CommunicationChannel

Parameter space exploration

Let us try to find out how the RMSE of the communication channel varies with the number of dimensions and the number of neurons. Usually you would write a bunch of nested for loops like this:


In [3]:
rmses = []
for D in np.arange(2, 5):
    for N in [10, 50, 100]:
        rmses.append(CommunicationChannel().run(D=D, N=N)['rmse'])


running CommunicationChannel#20160328-152630-11e3af7f
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16572583026875315
sim_speed = 7.043639184918233
running CommunicationChannel#20160328-152630-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.070963177680149134
sim_speed = 8.502611008400637
running CommunicationChannel#20160328-152630-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.028531004425303052
sim_speed = 8.638787279617729
running CommunicationChannel#20160328-152630-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.012725007735396695
sim_speed = 9.304754087450362
running CommunicationChannel#20160328-152630-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.017750297895821786
sim_speed = 8.73140287694902
running CommunicationChannel#20160328-152630-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.027605168028911279
sim_speed = 8.48161741659033
running CommunicationChannel#20160328-152631-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16662160881813956
sim_speed = 10.729041004783465
running CommunicationChannel#20160328-152631-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.087142299358396944
sim_speed = 8.984726643331777
running CommunicationChannel#20160328-152631-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.023944215909647099
sim_speed = 8.744710084042374

In [4]:
pprint(rmses)


[0.16572583026875315,
 0.070963177680149134,
 0.028531004425303052,
 0.012725007735396695,
 0.017750297895821786,
 0.027605168028911279,
 0.16662160881813956,
 0.087142299358396944,
 0.023944215909647099]

However, there are a few annoyances with this approach: we have to add another for loop for each dimension of the parameter space, and we have to handle storing the results ourselves. In particular, the association between results and input parameters is only implicit.

With psyrun, we can define a parameter space in a more natural way:


In [5]:
from psyrun import Param

pspace = Param(D=np.arange(2, 5)) * Param(N=[10, 50, 100])
print(pspace)


Param(D=[2, 2, 2, 3, 3, 3, 4, 4, 4], N=[10, 50, 100, 10, 50, 100, 10, 50, 100])

We use the Param class to define sets of parameters via keyword arguments to the constructor. As long as all sequences have the same length, it is possible to pass multiple arguments to the constructor:


In [6]:
print(Param(a=[1, 2], b=[3, 4]))


Param(a=[1, 2], b=[3, 4])

If at least one argument is a sequence, non-sequence arguments will be expanded to lists of the required length:


In [7]:
print(Param(a=[1, 2], b=3))


Param(a=[1, 2], b=[3, 3])

Such parameter definitions support basic operations: the Cartesian product (as shown above), concatenation with +, and two subtraction-like operations with - and the psyrun.pspace.missing function.
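Conceptually, these operations act on the underlying dictionary of parameter lists. A rough stdlib sketch of the product and concatenation semantics (this mirrors the idea, not psyrun's actual implementation):

```python
import itertools

def product(p1, p2):
    """Cartesian product of two parameter sets (dicts of equal-length lists)."""
    rows1 = [dict(zip(p1, vals)) for vals in zip(*p1.values())]
    rows2 = [dict(zip(p2, vals)) for vals in zip(*p2.values())]
    rows = [dict(a, **b) for a, b in itertools.product(rows1, rows2)]
    keys = list(p1) + list(p2)
    return {k: [r[k] for r in rows] for k in keys}

def concat(p1, p2):
    """Concatenation: append the assignments of p2 to those of p1."""
    return {k: list(p1[k]) + list(p2[k]) for k in p1}

print(product({'D': [2, 3, 4]}, {'N': [10, 50, 100]})['D'])
# [2, 2, 2, 3, 3, 3, 4, 4, 4]
```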

Once we have defined a parameter space to explore, it is easy to apply a function (i.e. run our model) to each parameter assignment.


In [8]:
import psyrun

result = psyrun.map_pspace(CommunicationChannel().run, pspace)


running CommunicationChannel#20160328-152631-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16572583026875315
sim_speed = 9.76084634413682
running CommunicationChannel#20160328-152631-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.070963177680149134
sim_speed = 9.590115169072405
running CommunicationChannel#20160328-152631-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.028531004425303052
sim_speed = 9.921217898528957
running CommunicationChannel#20160328-152632-4d65aacb
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.012725007735396695
sim_speed = 10.659266205662675
running CommunicationChannel#20160328-152632-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.017750297895821786
sim_speed = 9.910996533529302
running CommunicationChannel#20160328-152632-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 3
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.027605168028911279
sim_speed = 8.251111478325189
running CommunicationChannel#20160328-152632-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16662160881813956
sim_speed = 10.275069695885861
running CommunicationChannel#20160328-152632-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.087142299358396944
sim_speed = 8.868312774603872
running CommunicationChannel#20160328-152632-09efe4b2
Simulation finished in 0:00:01.                                                 
_D = 4
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.023944215909647099
sim_speed = 7.7395821223813455

In [9]:
pprint(result)


{'D': [2, 2, 2, 3, 3, 3, 4, 4, 4],
 'N': [10, 50, 100, 10, 50, 100, 10, 50, 100],
 'rmse': [0.16572583026875315,
          0.070963177680149134,
          0.028531004425303052,
          0.012725007735396695,
          0.017750297895821786,
          0.027605168028911279,
          0.16662160881813956,
          0.087142299358396944,
          0.023944215909647099],
 'sim_speed': [9.76084634413682,
               9.590115169072405,
               9.921217898528957,
               10.659266205662675,
               9.910996533529302,
               8.251111478325189,
               10.275069695885861,
               8.868312774603872,
               7.7395821223813455]}

The map_pspace function conveniently returns a dictionary with all the input parameters and output values. This makes it easy to convert the result to other data structures, such as a pandas data frame, for further analysis:


In [10]:
import pandas as pd
pd.DataFrame(result)


Out[10]:
D N rmse sim_speed
0 2 10 0.165726 9.760846
1 2 50 0.070963 9.590115
2 2 100 0.028531 9.921218
3 3 10 0.012725 10.659266
4 3 50 0.017750 9.910997
5 3 100 0.027605 8.251111
6 4 10 0.166622 10.275070
7 4 50 0.087142 8.868313
8 4 100 0.023944 7.739582
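The plain dictionary is also easy to analyze without pandas. For example, we can pick the number of neurons with the lowest RMSE for each dimensionality (using the result from above, with RMSE values rounded):

```python
result = {
    'D': [2, 2, 2, 3, 3, 3, 4, 4, 4],
    'N': [10, 50, 100, 10, 50, 100, 10, 50, 100],
    'rmse': [0.165726, 0.070963, 0.028531, 0.012725, 0.017750,
             0.027605, 0.166622, 0.087142, 0.023944],
}

# For each D, keep the (N, rmse) pair with the smallest RMSE.
best = {}
for d, n, rmse in zip(result['D'], result['N'], result['rmse']):
    if d not in best or rmse < best[d][1]:
        best[d] = (n, rmse)
print(best)  # {2: (100, 0.028531), 3: (10, 0.012725), 4: (100, 0.023944)}
```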

Parallelization

It would be nice to parallelize all these simulations to make the best use of multicore processors (though your BLAS library might be parallelized already). This is easy with psyrun: we just have to use a different map function. The only catch is that the function we want to apply needs to be picklable, which requires putting it into an importable Python module. (It is also possible to switch the parallelization backend from multiprocessing to threading, but that can introduce other issues.)
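The picklability requirement is why we write the function into a module below: locally defined functions and lambdas cannot be pickled for transfer to worker processes, while module-level functions can. A quick demonstration with the standard pickle module:

```python
import pickle

# A module-level (top-level) function pickles fine, by reference to its module.
def evaluate(x):
    return x * 2

pickle.dumps(evaluate)  # works

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x * 2)
except pickle.PicklingError as e:
    print('cannot pickle a lambda:', e)
```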


In [11]:
%%sh
cat << EOF > model.py
from ctn_benchmark.nengo.communication import CommunicationChannel
def evaluate(**kwargs):
    return CommunicationChannel().run(**kwargs)
EOF

In [12]:
from model import evaluate
result2 = psyrun.map_pspace_parallel(evaluate, pspace)


running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
running CommunicationChannel#20160328-152633-09efe4b2
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.070963177680149134
sim_speed = 4.957144022134159
_D = 2
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16572583026875315
sim_speed = 4.187779439633052
_D = 3
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.017750297895821786
sim_speed = 6.460906186554772
_D = 2
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.028531004425303052
sim_speed = 4.271733440339595
_D = 3
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.012725007735396695
sim_speed = 4.16351067504735
_D = 4
_L = 2
_N = 50
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.087142299358396944
sim_speed = 5.09186174545693
_D = 4
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.023944215909647099
sim_speed = 4.6014242098114915
_D = 3
_L = 2
_N = 100
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.027605168028911279
sim_speed = 4.232498062516146
_D = 4
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16662160881813956
sim_speed = 4.321293862744472

In [13]:
pprint(result2)


{'D': [2, 2, 2, 3, 3, 3, 4, 4, 4],
 'N': [10, 50, 100, 10, 50, 100, 10, 50, 100],
 'rmse': [0.16572583026875315,
          0.070963177680149134,
          0.028531004425303052,
          0.012725007735396695,
          0.017750297895821786,
          0.027605168028911279,
          0.16662160881813956,
          0.087142299358396944,
          0.023944215909647099],
 'sim_speed': [4.187779439633052,
               4.957144022134159,
               4.271733440339595,
               4.16351067504735,
               6.460906186554772,
               4.232498062516146,
               4.321293862744472,
               5.09186174545693,
               4.6014242098114915]}

Serial farming

After doing some initial experiments, you might want to run a larger number of simulations. Because all the simulations are independent of one another, they could run one after another on your computer, but that would take a very long time. It is faster to distribute the individual simulations across a high-performance cluster (like Sharcnet) with many CPU cores that can run them in parallel. This is called serial farming.

To do serial farming, first make sure that your model lives in a Python module that can be installed, is installed, and can be imported. (It is possible to change sys.path if you don't want to install it as a Python module, but this approach is more complicated and error-prone.) Then you need to define a "task" for what you intend to run: create a file task_<name>.py in a directory called psy-tasks. Here is an example task for our communication channel:


In [14]:
# task_cc1.py

from ctn_benchmark.nengo.communication import CommunicationChannel
import numpy as np
import psyrun
from psyrun import Param

pspace = Param(D=np.arange(2, 5)) * Param(N=[10, 50, 100])
min_items = 1
max_jobs = 6

def execute(**kwargs):
    return CommunicationChannel().run(**kwargs)

A number of variables have a special meaning in these kinds of task files: pspace defines the parameter space to explore, which should be familiar by now. execute is the function that gets called for each parameter assignment and returns a dictionary with the results. max_jobs says that a maximum of 6 processing jobs should be created (a few other jobs not doing the core processing might be created in addition).

When running this task (we will get to that in a moment), the parameter space will be split into parts of roughly equal size and assigned to the processing jobs. At the end, the results from all processing jobs will be merged into a single file. min_items defines how many parameter assignments each job should process at a minimum.
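How the split works out can be sketched roughly (this mirrors the idea, not psyrun's exact splitting code): with 9 parameter assignments, max_jobs = 6, and min_items = 1, the assignments are divided as evenly as possible among up to 6 jobs:

```python
import itertools

def split(n_items, max_jobs, min_items):
    """Split n_items into at most max_jobs chunks of at least min_items each."""
    n_jobs = min(max_jobs, max(1, n_items // min_items))
    base, extra = divmod(n_items, n_jobs)
    sizes = [base + (i < extra) for i in range(n_jobs)]
    starts = [0] + list(itertools.accumulate(sizes))
    return [list(range(starts[i], starts[i + 1])) for i in range(n_jobs)]

print(split(9, max_jobs=6, min_items=1))
# 6 jobs: [[0, 1], [2, 3], [4, 5], [6], [7], [8]]
```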

To run tasks defined in this way, we use the psy-doit command. As the name suggests, this tool is based on the excellent doit automation tool. Be sure that your working directory contains the psy-tasks directory when invoking this command.

First we can list the available tasks:


In [20]:
%%sh
psy-doit list


cc1   
cc2   

Let us test our task first:


In [16]:
%%sh
psy-doit test cc1


cc1
running CommunicationChannel#20160328-152634-1b7add91
Simulation finished in 0:00:01.                                                 
_D = 2
_L = 2
_N = 10
_pstc = 0.01
_T = 1.0
_backend = 'nengo'
_dt = 0.001
_seed = 1
_hide_overlay = False
_gui = False
rmse = 0.16572583026875315
sim_speed = 10.075970288371915

This immediately runs the first (and only the first) parameter assignment as a quick check that nothing crashes. Now let us run the full task.


In [17]:
%%sh
psy-doit cc1


.  cc1:split
.  cc1:process:0
.  cc1:process:1
.  cc1:process:2
.  cc1:process:3
.  cc1:process:4
.  cc1:process:5
.  cc1:merge

This produces standard doit output with dots (.) indicating that a task was executed. As you can see, the first step splits the parameter space, then several processing jobs run, and in the end the results are merged.

All intermediate files will be written to the psy-work/<task name> directory by default. This is also where we find the result file. psyrun provides some helper functions to load it:


In [18]:
data = psyrun.NpzStore().load('psy-work/cc1/result.npz')
pprint(data)


{'D': array([2, 2, 2, 3, 4, 3, 3, 4, 4]),
 'N': array([ 10, 100,  50, 100,  10,  10,  50,  50, 100]),
 'rmse': array([ 0.16572583,  0.028531  ,  0.07096318,  0.02760517,  0.16662161,
        0.01272501,  0.0177503 ,  0.0871423 ,  0.02394422]),
 'sim_speed': array([  9.86115448,   9.05615532,   9.2535156 ,   8.22138503,
        10.00549618,  10.20564068,   9.70149143,   9.1584289 ,   9.24154737])}

By default, NumPy .npz files are used to store this data. However, this format has a few disadvantages. For example, to append to such a file (which happens in the merge step), the whole file has to be loaded into memory. It is therefore possible to switch to HDF5, which is better in this regard: just add the line store = psyrun.H5Store() to your task file. Note also that, as the output above shows, the merged rows are not necessarily in the original parameter order.

Note that this requires pytables to be installed, and that the Sharcnet pytables installation seems to be broken at the moment.

Speaking of Sharcnet: so far psy-doit runs the jobs sequentially and immediately. To actually get serial farming, we have to tell psyrun to invoke the right job scheduler. For Sharcnet this is sqsub. Here is the updated task file:


In [19]:
# task_cc2.py

import platform

from ctn_benchmark.nengo.communication import CommunicationChannel
import numpy as np
from psyrun import Param, Sqsub

pspace = Param(D=np.arange(2, 5)) * Param(N=[10, 50, 100])
min_items = 1
max_jobs = 6

sharcnet_nodes = ['narwhal', 'bul', 'kraken', 'saw']
if any(platform.node().startswith(x) for x in sharcnet_nodes):
    workdir = '/work/jgosmann/tcm'
    scheduler = Sqsub(workdir)
    scheduler_args = {
        'timelimit': '10m',
        'memory': '512M'
    }

def execute(**kwargs):
    return CommunicationChannel().run(**kwargs)

First of all, I check whether I am actually on a Sharcnet node before applying the Sharcnet-specific settings. This is not required, but allows me to run the same file on my local computer. (Note that the list of Sharcnet nodes contains only the ones I typically use; you might want to use a different list.)

workdir is a special variable that sets the location for intermediate and result files. I am changing the location because on Sharcnet /work offers more storage space and better performance. The essential line for switching to the sqsub scheduler is scheduler = Sqsub(workdir); finally, we add some scheduler-specific arguments. For sqsub you have to specify a run-time limit and a memory limit.

Now you can run psy-doit cc2 on Sharcnet and it will submit a number of jobs and return. Use sqjobs to check the status of your jobs and see when they are finished.

Sometimes jobs crash. If that is the case, run psy-doit cc2 again and it should figure out by itself which jobs have to be resubmitted.

Load balancing

Sometimes the parameters can significantly influence the runtime of your model. That can be problematic on Sharcnet: if you set the time limit too low, your jobs will get killed before completion, but a high time limit will push your jobs back in the queue. In such cases it might be good to do some simple load balancing.

If you insert backend = psyrun.LoadBalancingBackend, psyrun will submit max_jobs jobs, and each job will process as many parameter assignments as it can, always grabbing the next one in the queue, until it gets killed. Unfortunately, the parameter assignment of a killed job will not be re-entered into the queue. Thus, after all jobs have completed, you have to check the result file for completeness. If results are missing, re-execute psy-doit <task name>; this will only process the parameter assignments with missing results.
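Checking for completeness amounts to a set difference between the intended parameter space and the assignments present in the result file (psyrun's missing function plays this role). A stdlib sketch of that check, with made-up result data in which one assignment is missing:

```python
# Intended parameter assignments, as (D, N) tuples.
intended = {(d, n) for d in (2, 3, 4) for n in (10, 50, 100)}

# Hypothetical result file contents with one assignment missing.
result = {'D': [2, 2, 2, 3, 3, 3, 4, 4],
          'N': [10, 50, 100, 10, 50, 100, 10, 50]}

done = set(zip(result['D'], result['N']))
missing = sorted(intended - done)
print(missing)  # [(4, 100)]
```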

It is best to use the load balancing backend with the H5Store, because it appends to files much more often, which is inefficient with the NpzStore. Also note that the load balancing backend is newer and thus less well tested.


In [ ]: